Add support for Coqui TTS #59

Open: Cabeda wants to merge 19 commits into base: main

Cabeda commented Apr 2, 2024

As the title says, this PR adds a new provider that supports Coqui TTS.

The default model, Tacotron2, works very similarly to EdgeTTS, although it only has a single voice option for now. The strength of this provider is that it can support multiple open TTS models, some of them very powerful, like jenny.

Another interesting feature is voice dubbing with the likes of XTTS V2. For now, though, there's a bug with sentences longer than 400 tokens. To support voice dubbing I've added a folder with 3 voice samples and defaulted to the male one. Additionally, multiple languages are supported in this mode. As the options differ from the ones for --language, I've added a new option named --coqui_language.
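Until the 400-token issue is fixed, one possible workaround is to split long text into smaller chunks before it reaches the model. The following is only a sketch and not part of this PR: `split_long_text` and `max_tokens` are hypothetical names, and whitespace-separated words are used as a rough stand-in for tokens, so the real tokenizer may count differently and a lower limit may be needed in practice.

```python
import re


def split_long_text(text: str, max_tokens: int = 400) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Assumption: words approximate tokens. The model's own tokenizer is not used.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    # Prefer sentence boundaries: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) > max_tokens:
            # A single sentence is too long: flush what we have, then hard-split it.
            if current:
                chunks.append(" ".join(current))
                current = []
            for start in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[start:start + max_tokens]))
            continue
        if len(current) + len(words) > max_tokens:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk could then be synthesized separately and the audio segments concatenated afterwards.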

For this version, the provider supports the same audio formats as EdgeTTS, thanks to pydub.

Note: To run, Coqui TTS will always download the AI model it uses. The download can range from a few MB to more than 1 GB.

Cabeda (Author) commented Apr 4, 2024

@p0n1 do you have time to give your thoughts on this PR?

p0n1 (Owner) commented Apr 4, 2024

> @p0n1 do you have time to give your thoughts on this PR?

Hi @Cabeda, thank you for the great work. I just had surgery and am still recovering in the hospital. I will review the code once I feel better.

Cabeda (Author) commented Apr 4, 2024

No probs! Hope for the best 💪🏼

kelvin-homann commented

Have you tried building the Docker image from the Dockerfile with this change? I checked out your repository, but apparently the image is missing gcc and the Rust compiler. I think another image is needed to install TTS in the Docker image.

Bryksin (Collaborator) left a comment

It would be nice if you added a link to a guide on setting up Coqui to the README file, so I can set it up and test it before merging.



 def get_tts_provider(config) -> BaseTTSProvider:
     if config.tts == TTS_AZURE:
-        from audiobook_generator.tts_providers.azure_tts_provider import AzureTTSProvider
+        from audiobook_generator.tts_providers.azure_tts_provider import \

No functional change, just cosmetics; not needed.

         return AzureTTSProvider(config)
     elif config.tts == TTS_OPENAI:
-        from audiobook_generator.tts_providers.openai_tts_provider import OpenAITTSProvider
+        from audiobook_generator.tts_providers.openai_tts_provider import \

No functional change, just cosmetics; not needed.

         return OpenAITTSProvider(config)
     elif config.tts == TTS_EDGE:
-        from audiobook_generator.tts_providers.edge_tts_provider import EdgeTTSProvider
+        from audiobook_generator.tts_providers.edge_tts_provider import \

No functional change, just cosmetics; not needed.
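For context on why the function above imports each provider inside its own branch: a provider's dependencies are then only loaded when that provider is selected, which seems especially relevant for a provider with heavy dependencies such as Coqui TTS. The same idea can be sketched with a small registry and `importlib`. This is an illustration only, not the code in this PR: the registry keys, the `coqui` entry, and the names `PROVIDER_REGISTRY` and `load_provider_class` are assumptions, while the azure, openai, and edge module paths are the ones visible in the diff.

```python
import importlib

# Maps a provider key to (module path, class name).
# Keys and the "coqui" entry are hypothetical; the other module paths
# are copied from the diff under review.
PROVIDER_REGISTRY = {
    "azure": ("audiobook_generator.tts_providers.azure_tts_provider", "AzureTTSProvider"),
    "openai": ("audiobook_generator.tts_providers.openai_tts_provider", "OpenAITTSProvider"),
    "edge": ("audiobook_generator.tts_providers.edge_tts_provider", "EdgeTTSProvider"),
    "coqui": ("audiobook_generator.tts_providers.coqui_tts_provider", "CoquiTTSProvider"),
}


def load_provider_class(name, registry=PROVIDER_REGISTRY):
    """Import the module for `name` on demand and return its provider class."""
    try:
        module_path, class_name = registry[name]
    except KeyError:
        raise ValueError(f"Invalid TTS provider: {name}") from None
    # The import happens here, so unused providers never load their dependencies.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

A caller would then do something like `load_provider_class(config.tts)(config)`, assuming `config.tts` holds one of the registry keys.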

@@ -94,23 +94,23 @@ def handle_args():
         help='''
             Speaking rate of the text. Valid relative values range from -50%%(--xxx='-50%%') to +100%%.
             For negative value use format --arg=value,
-        '''
+        ''',

It is the last argument in the function call, so the trailing comma is not required.

     )

     edge_tts_group.add_argument(
         "--voice_volume",
         help='''
             Volume level of the speaking voice. Valid relative values floor to -100%%.
             For negative value use format --arg=value,
-        '''
+        ''',

It is the last argument in the function call, so the trailing comma is not required.

     )

     edge_tts_group.add_argument(
         "--voice_pitch",
         help='''
             Baseline pitch for the text.Valid relative values like -80Hz,+50Hz, pitch changes should be within 0.5 to 1.5 times the original audio.
             For negative value use format --arg=value,
-        '''
+        ''',

It is the last argument in the function call, so the trailing comma is not required.
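As background for the help strings in these hunks, they rely on two argparse behaviours: a literal percent sign in a help string has to be written as `%%`, and a value that starts with a dash should be passed as `--arg=value` so that it is not read as another option. A small standalone sketch of both follows. The parser and the `try_parse` helper are illustrative and are not the project's `handle_args`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument(
        "--voice_rate",
        # argparse %-formats help strings, so a literal '%' is written as '%%'.
        help="Speaking rate, from -50%% (--voice_rate='-50%%') to +100%%.",
    )
    return parser


def try_parse(argv):
    """Return the parsed voice_rate, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).voice_rate
    except SystemExit:
        # argparse reports the error on stderr and exits; treat that as a rejection.
        return None
```

With this parser, `try_parse(["--voice_rate=-50%"])` returns `"-50%"`, whereas `try_parse(["--voice_rate", "-50%"])` returns `None`, because argparse treats the dash-prefixed `-50%` as an option rather than as the value.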
